Overview

Dataset statistics

Number of variables: 9
Number of observations: 768
Missing cells: 406
Missing cells (%): 5.9%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 54.1 KiB
Average record size in memory: 72.2 B

Variable types

Numeric: 8
Categorical: 1

Alerts

Outcome has constant value "0.0" (Constant)
Pregnancies is highly correlated with Age (High correlation)
SkinThickness is highly correlated with BMI (High correlation)
BMI is highly correlated with SkinThickness (High correlation)
Age is highly correlated with Pregnancies (High correlation)
SkinThickness is highly correlated with Insulin and 1 other field (High correlation)
Insulin is highly correlated with SkinThickness (High correlation)
Pregnancies has 23 (3.0%) missing values (Missing)
Glucose has 16 (2.1%) missing values (Missing)
BloodPressure has 16 (2.1%) missing values (Missing)
SkinThickness has 19 (2.5%) missing values (Missing)
Insulin has 16 (2.1%) missing values (Missing)
BMI has 16 (2.1%) missing values (Missing)
DiabetesPedigreeFunction has 16 (2.1%) missing values (Missing)
Age has 16 (2.1%) missing values (Missing)
Outcome has 268 (34.9%) missing values (Missing)
Pregnancies has 111 (14.5%) zeros (Zeros)
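
The overall "Missing cells" figure in the dataset statistics can be cross-checked against the per-variable missing counts listed in the alerts. The following is a minimal sketch of that arithmetic; the variable names are illustrative and not taken from the report tooling.

```python
# Per-variable missing counts, copied from the "Missing" alerts in this report.
missing_per_column = {
    "Pregnancies": 23,
    "Glucose": 16,
    "BloodPressure": 16,
    "SkinThickness": 19,
    "Insulin": 16,
    "BMI": 16,
    "DiabetesPedigreeFunction": 16,
    "Age": 16,
    "Outcome": 268,
}
n_observations = 768  # "Number of observations" in the dataset statistics
n_variables = len(missing_per_column)

missing_cells = sum(missing_per_column.values())
total_cells = n_observations * n_variables
missing_pct = 100 * missing_cells / total_cells

print(missing_cells)          # 406
print(f"{missing_pct:.1f}%")  # 5.9%
```

Both values agree with the reported 406 missing cells (5.9%).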

Reproduction

Analysis started: 2022-09-18 11:50:57.772984
Analysis finished: 2022-09-18 11:51:14.093898
Duration: 16.32 seconds
Software version: pandas-profiling v3.2.0
Download configuration: config.json

Variables

Pregnancies
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
ZEROS

Distinct: 12
Distinct (%): 1.6%
Missing: 23
Missing (%): 3.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.563758389
Minimum: 0
Maximum: 11
Zeros: 111
Zeros (%): 14.5%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 3
Q3: 6
95-th percentile: 9
Maximum: 11
Range: 11
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.00296034
Coefficient of variation (CV): 0.8426385887
Kurtosis: -0.5688999247
Mean: 3.563758389
Median Absolute Deviation (MAD): 2
Skewness: 0.6845704539
Sum: 2655
Variance: 9.017770802
Monotonicity: Not monotonic
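
Several of the figures above are related by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, the IQR is Q3 minus Q1, and the sum is the mean times the number of non-missing values. A minimal sketch that checks these relations for Pregnancies, using only the numbers reported in this section (variable names are illustrative):

```python
# Values copied from the Pregnancies statistics in this report.
mean = 3.563758389
std = 3.00296034
q1, q3 = 1, 6
minimum, maximum = 0, 11
n_valid = 768 - 23  # observations minus missing values

cv = std / mean               # coefficient of variation
variance = std ** 2           # variance is the squared standard deviation
iqr = q3 - q1                 # interquartile range
value_range = maximum - minimum
total = mean * n_valid        # sum over non-missing values

print(round(cv, 4))        # 0.8426
print(round(variance, 4))  # 9.0178
print(iqr, value_range)    # 5 11
print(round(total))        # 2655
```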
Histogram with fixed size bins (bins=12)

Value             Count  Frequency (%)
1                 135    17.6%
0                 111    14.5%
2                 103    13.4%
3                 75     9.8%
4                 68     8.9%
5                 57     7.4%
6                 50     6.5%
7                 45     5.9%
8                 38     4.9%
9                 28     3.6%
Other values (2)  35     4.6%

Value  Count  Frequency (%)
0      111    14.5%
1      135    17.6%
2      103    13.4%
3      75     9.8%
4      68     8.9%
5      57     7.4%
6      50     6.5%
7      45     5.9%
8      38     4.9%
9      28     3.6%

Value  Count  Frequency (%)
11     11     1.4%
10     24     3.1%
9      28     3.6%
8      38     4.9%
7      45     5.9%
6      50     6.5%
5      57     7.4%
4      68     8.9%
3      75     9.8%
2      103    13.4%

Glucose
Real number (ℝ≥0)

MISSING

Distinct: 129
Distinct (%): 17.2%
Missing: 16
Missing (%): 2.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 120.1070115
Minimum: 44
Maximum: 191
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 44
5-th percentile: 80
Q1: 99
Median: 117
Q3: 139
95-th percentile: 176
Maximum: 191
Range: 147
Interquartile range (IQR): 40

Descriptive statistics

Standard deviation: 28.75511625
Coefficient of variation (CV): 0.2394124696
Kurtosis: -0.3477252184
Mean: 120.1070115
Median Absolute Deviation (MAD): 19
Skewness: 0.4459438134
Sum: 90320.47266
Variance: 826.8567103
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
100                 17     2.2%
99                  17     2.2%
129                 14     1.8%
125                 14     1.8%
111                 14     1.8%
106                 14     1.8%
105                 13     1.7%
102                 13     1.7%
112                 13     1.7%
108                 13     1.7%
Other values (119)  610    79.4%
(Missing)           16     2.1%

Value  Count  Frequency (%)
44     1      0.1%
56     1      0.1%
57     2      0.3%
61     1      0.1%
62     1      0.1%
65     1      0.1%
67     1      0.1%
68     3      0.4%
71     4      0.5%
72     1      0.1%

Value  Count  Frequency (%)
191    1      0.1%
190    1      0.1%
189    4      0.5%
188    2      0.3%
187    4      0.5%
186    1      0.1%
184    3      0.4%
183    3      0.4%
182    1      0.1%
181    5      0.7%
0.7%

BloodPressure
Real number (ℝ≥0)

MISSING

Distinct: 39
Distinct (%): 5.2%
Missing: 16
Missing (%): 2.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 71.51820666
Minimum: 24
Maximum: 98
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 24
5-th percentile: 52
Q1: 64
Median: 72
Q3: 80
95-th percentile: 90
Maximum: 98
Range: 74
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 11.0988587
Coefficient of variation (CV): 0.1551892759
Kurtosis: 0.4352308771
Mean: 71.51820666
Median Absolute Deviation (MAD): 8
Skewness: -0.2714603482
Sum: 53781.69141
Variance: 123.1846645
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=39)

Value              Count  Frequency (%)
70                 57     7.4%
74                 52     6.8%
68                 45     5.9%
78                 45     5.9%
72                 44     5.7%
64                 43     5.6%
80                 40     5.2%
76                 39     5.1%
60                 37     4.8%
69.10546875        35     4.6%
Other values (29)  315    41.0%

Value  Count  Frequency (%)
24     1      0.1%
30     2      0.3%
38     1      0.1%
40     1      0.1%
44     4      0.5%
46     2      0.3%
48     5      0.7%
50     13     1.7%
52     11     1.4%
54     11     1.4%

Value  Count  Frequency (%)
98     3      0.4%
96     4      0.5%
95     1      0.1%
94     6      0.8%
92     8      1.0%
90     22     2.9%
88     25     3.3%
86     21     2.7%
85     6      0.8%
84     23     3.0%

SkinThickness
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 41
Distinct (%): 5.5%
Missing: 19
Missing (%): 2.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 25.90624305
Minimum: 7
Maximum: 47
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 7
5-th percentile: 14
Q1: 20.53645833
Median: 22
Q3: 32
95-th percentile: 42
Maximum: 47
Range: 40
Interquartile range (IQR): 11.46354167

Descriptive statistics

Standard deviation: 8.486654456
Coefficient of variation (CV): 0.3275910923
Kurtosis: -0.452408322
Mean: 25.90624305
Median Absolute Deviation (MAD): 5
Skewness: 0.5744201519
Sum: 19403.77604
Variance: 72.02330386
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=41)

Value              Count  Frequency (%)
20.53645833        227    29.6%
32                 31     4.0%
30                 27     3.5%
27                 23     3.0%
23                 22     2.9%
28                 20     2.6%
18                 20     2.6%
33                 20     2.6%
31                 19     2.5%
39                 18     2.3%
Other values (31)  322    41.9%
(Missing)          19     2.5%

Value  Count  Frequency (%)
7      2      0.3%
8      2      0.3%
10     5      0.7%
11     6      0.8%
12     7      0.9%
13     11     1.4%
14     6      0.8%
15     14     1.8%
16     6      0.8%
17     14     1.8%

Value  Count  Frequency (%)
47     4      0.5%
46     8      1.0%
45     6      0.8%
44     5      0.7%
43     6      0.8%
42     11     1.4%
41     15     2.0%
40     16     2.1%
39     18     2.3%
38     7      0.9%

Insulin
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 172
Distinct (%): 22.9%
Missing: 16
Missing (%): 2.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 109.2513367
Minimum: 14
Maximum: 465
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 14
5-th percentile: 50
Q1: 79.79947917
Median: 79.79947917
Q3: 120
95-th percentile: 261.15
Maximum: 465
Range: 451
Interquartile range (IQR): 40.20052083

Descriptive statistics

Standard deviation: 66.02068868
Coefficient of variation (CV): 0.6043009694
Kurtosis: 5.482330419
Mean: 109.2513367
Median Absolute Deviation (MAD): 1
Skewness: 2.220288604
Sum: 82157.00521
Variance: 4358.731334
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
79.79947917         374    48.7%
105                 11     1.4%
140                 9      1.2%
130                 9      1.2%
120                 8      1.0%
100                 7      0.9%
180                 7      0.9%
94                  7      0.9%
135                 6      0.8%
110                 6      0.8%
Other values (162)  308    40.1%
(Missing)           16     2.1%

Value  Count  Frequency (%)
14     1      0.1%
15     1      0.1%
16     1      0.1%
18     2      0.3%
22     1      0.1%
23     2      0.3%
25     1      0.1%
29     1      0.1%
32     1      0.1%
36     3      0.4%

Value  Count  Frequency (%)
465    1      0.1%
440    1      0.1%
415    1      0.1%
402    1      0.1%
392    1      0.1%
387    1      0.1%
375    1      0.1%
370    1      0.1%
360    1      0.1%
342    1      0.1%

BMI
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 234
Distinct (%): 31.1%
Missing: 16
Missing (%): 2.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.02289675
Minimum: 18.2
Maximum: 46.8
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 18.2
5-th percentile: 22.2
Q1: 27.475
Median: 32
Q3: 36.1
95-th percentile: 43.3
Maximum: 46.8
Range: 28.6
Interquartile range (IQR): 8.625

Descriptive statistics

Standard deviation: 6.240857639
Coefficient of variation (CV): 0.1948873547
Kurtosis: -0.5088654669
Mean: 32.02289675
Median Absolute Deviation (MAD): 4.4
Skewness: 0.1640359455
Sum: 24081.21836
Variance: 38.94830408
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
32                  13     1.7%
31.6                12     1.6%
31.2                12     1.6%
31.99257812         11     1.4%
33.3                10     1.3%
32.4                10     1.3%
30.1                9      1.2%
32.8                9      1.2%
32.9                9      1.2%
30.8                9      1.2%
Other values (224)  648    84.4%
(Missing)           16     2.1%

Value  Count  Frequency (%)
18.2   3      0.4%
18.4   1      0.1%
19.1   1      0.1%
19.3   1      0.1%
19.4   1      0.1%
19.5   2      0.3%
19.6   3      0.4%
19.9   1      0.1%
20     1      0.1%
20.1   1      0.1%

Value  Count  Frequency (%)
46.8   2      0.3%
46.7   1      0.1%
46.5   1      0.1%
46.3   1      0.1%
46.2   2      0.3%
46.1   2      0.3%
45.8   1      0.1%
45.7   1      0.1%
45.6   2      0.3%
45.5   1      0.1%

DiabetesPedigreeFunction
Real number (ℝ≥0)

MISSING

Distinct: 501
Distinct (%): 66.6%
Missing: 16
Missing (%): 2.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4444973404
Minimum: 0.078
Maximum: 1.39
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0.078
5-th percentile: 0.14
Q1: 0.23975
Median: 0.3645
Q3: 0.60025
95-th percentile: 0.968
Maximum: 1.39
Range: 1.312
Interquartile range (IQR): 0.3605

Descriptive statistics

Standard deviation: 0.2712069586
Coefficient of variation (CV): 0.61014304
Kurtosis: 0.6776320936
Mean: 0.4444973404
Median Absolute Deviation (MAD): 0.1615
Skewness: 1.076351445
Sum: 334.262
Variance: 0.07355321437
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Value               Count  Frequency (%)
0.258               6      0.8%
0.254               6      0.8%
0.261               5      0.7%
0.268               5      0.7%
0.238               5      0.7%
0.207               5      0.7%
0.259               5      0.7%
0.304               4      0.5%
0.26                4      0.5%
0.245               4      0.5%
Other values (491)  703    91.5%
(Missing)           16     2.1%

Value  Count  Frequency (%)
0.078  1      0.1%
0.084  1      0.1%
0.085  2      0.3%
0.088  2      0.3%
0.089  1      0.1%
0.092  1      0.1%
0.096  1      0.1%
0.1    1      0.1%
0.101  1      0.1%
0.102  1      0.1%

Value  Count  Frequency (%)
1.39   1      0.1%
1.353  1      0.1%
1.321  1      0.1%
1.318  1      0.1%
1.292  1      0.1%
1.282  1      0.1%
1.268  1      0.1%
1.258  1      0.1%
1.251  1      0.1%
1.224  2      0.3%

Age
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 44
Distinct (%): 5.9%
Missing: 16
Missing (%): 2.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.5
Minimum: 21
Maximum: 64
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 21
5-th percentile: 21
Q1: 24
Median: 29
Q3: 40
95-th percentile: 54.45
Maximum: 64
Range: 43
Interquartile range (IQR): 16

Descriptive statistics

Standard deviation: 10.70286247
Coefficient of variation (CV): 0.3293188451
Kurtosis: 0.05671186164
Mean: 32.5
Median Absolute Deviation (MAD): 7
Skewness: 0.9741945595
Sum: 24440
Variance: 114.551265
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=44)

Value              Count  Frequency (%)
22                 72     9.4%
21                 63     8.2%
25                 48     6.2%
24                 46     6.0%
23                 38     4.9%
28                 35     4.6%
26                 33     4.3%
27                 32     4.2%
29                 29     3.8%
31                 24     3.1%
Other values (34)  332    43.2%

Value  Count  Frequency (%)
21     63     8.2%
22     72     9.4%
23     38     4.9%
24     46     6.0%
25     48     6.2%
26     33     4.3%
27     32     4.2%
28     35     4.6%
29     29     3.8%
30     21     2.7%

Value  Count  Frequency (%)
64     1      0.1%
63     4      0.5%
62     4      0.5%
61     2      0.3%
60     5      0.7%
59     3      0.4%
58     7      0.9%
57     5      0.7%
56     3      0.4%
55     4      0.5%

Outcome
Categorical

CONSTANT
MISSING
REJECTED

Distinct: 1
Distinct (%): 0.2%
Missing: 268
Missing (%): 34.9%
Memory size: 6.1 KiB

Value  Count
0.0    500

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 1500
Distinct characters: 2
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value      Count  Frequency (%)
0.0        500    65.1%
(Missing)  268    34.9%

Length

Histogram of lengths of the category

Category Frequency Plot

Value  Count  Frequency (%)
0.0    500    100.0%

Most occurring characters

Value  Count  Frequency (%)
0      1000   66.7%
.      500    33.3%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     1000   66.7%
Other Punctuation  500    33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      1000   100.0%
Other Punctuation
Value  Count  Frequency (%)
.      500    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  1500   100.0%

Most frequent character per script

Common
Value  Count  Frequency (%)
0      1000   66.7%
.      500    33.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1500   100.0%

Most frequent character per block

ASCII
Value  Count  Frequency (%)
0      1000   66.7%
.      500    33.3%

Interactions

[Figure: pairwise interaction plots for the numeric variables]

Correlations


Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
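
The calculation described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the code the report generator runs; the helper names `average_ranks`, `pearson` and `spearman` are assumptions, and tied values are given the average of their rank positions.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# A monotonic but nonlinear relationship (made-up data).
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman(x, y))  # approximately 1.0
print(pearson(x, y))   # below 1.0, because the relationship is not linear
```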

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
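
As a minimal sketch of this formula, the following standard-library Python computes r and illustrates the invariance under a change of location and (positive) scale. The function name `pearson` and the data are made up for illustration; a negative scale factor would flip the sign of r.

```python
from statistics import mean

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    # The 1/n factors in the covariance and standard deviations cancel out.
    return cov / (sx * sy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson(x, y)

# Shift and rescale y; r should be unchanged up to floating-point error.
y_shifted = [3.0 * v + 7.0 for v in y]
r_shifted = pearson(x, y_shifted)

print(round(r, 4), round(r_shifted, 4))
```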

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
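
The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch follows; the function name `kendall_tau_a` and the data are illustrative, and library implementations commonly apply a tie correction (tau-b), so their results can differ when ties are present.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it counts toward neither group
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)  # 5 concordant and 1 discordant pair out of 6
```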

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
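
One common way to quantify the nullity correlation mentioned above is to turn each column into a 0/1 presence indicator and correlate the indicators. The sketch below does this with the standard library; it is an assumption about the approach rather than the exact computation behind the heatmap, and the helper names and the small columns are hypothetical.

```python
from statistics import mean

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def nullity_correlation(col_a, col_b):
    """Correlate presence indicators (1.0 = present, 0.0 = missing).

    Undefined (division by zero) when a column is fully present or fully missing.
    """
    present_a = [0.0 if v is None else 1.0 for v in col_a]
    present_b = [0.0 if v is None else 1.0 for v in col_b]
    return pearson(present_a, present_b)

# Hypothetical columns; None marks a missing value.
glucose = [148.0, None, 183.0, None, 137.0, 116.0]
insulin = [79.8, None, 79.8, None, 168.0, 88.0]
age = [50.0, 31.0, None, 21.0, 33.0, None]

same_pattern = nullity_correlation(glucose, insulin)  # missing in the same rows
opposed = nullity_correlation(glucose, age)           # mostly missing in different rows
print(same_pattern, opposed)
```

A value near +1 means the two columns tend to be missing together; a negative value means one tends to be present where the other is missing.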

Sample

First rows

     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin     BMI        DiabetesPedigreeFunction  Age   Outcome
0    6.0          148.0    72.000000      35.000000      79.799479   33.600000  0.627                     50.0  NaN
1    1.0          85.0     66.000000      29.000000      79.799479   26.600000  0.351                     31.0  0.0
2    8.0          183.0    64.000000      20.536458      79.799479   23.300000  0.672                     32.0  NaN
3    1.0          89.0     66.000000      23.000000      94.000000   28.100000  0.167                     21.0  0.0
4    0.0          137.0    40.000000      35.000000      168.000000  43.100000  NaN                       33.0  NaN
5    5.0          116.0    74.000000      20.536458      79.799479   25.600000  0.201                     30.0  0.0
6    3.0          78.0     50.000000      32.000000      88.000000   31.000000  0.248                     26.0  NaN
7    10.0         115.0    69.105469      20.536458      79.799479   35.300000  0.134                     29.0  0.0
8    2.0          NaN      70.000000      45.000000      NaN         30.500000  0.158                     53.0  NaN
9    8.0          125.0    96.000000      20.536458      79.799479   31.992578  0.232                     54.0  NaN
98.0125.096.00000020.53645879.79947931.9925780.23254.0NaN

Last rows

     Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin     BMI   DiabetesPedigreeFunction  Age   Outcome
758  1.0          106.0    76.0           20.536458      79.799479   37.5  0.197                     26.0  0.0
759  6.0          190.0    92.0           20.536458      79.799479   35.5  0.278                     NaN   NaN
760  2.0          88.0     58.0           26.000000      16.000000   28.4  0.766                     22.0  0.0
761  9.0          170.0    74.0           31.000000      79.799479   44.0  0.403                     43.0  NaN
762  9.0          89.0     62.0           20.536458      79.799479   22.5  0.142                     33.0  0.0
763  10.0         101.0    76.0           NaN            180.000000  32.9  0.171                     63.0  0.0
764  2.0          122.0    70.0           27.000000      79.799479   36.8  0.340                     27.0  0.0
765  5.0          121.0    72.0           23.000000      112.000000  26.2  0.245                     30.0  0.0
766  1.0          126.0    60.0           20.536458      79.799479   30.1  0.349                     47.0  NaN
767  1.0          93.0     70.0           31.000000      79.799479   30.4  0.315                     23.0  0.0